Replace Missing Values with EM algorithm based on GMM and Naïve Bayesian

Authors

  • Xi-Yu Zhou
  • Joon S. Lim
Abstract

In data mining applications, experimental datasets often contain various kinds of missing values. Leaving missing values unreplaced, or treating them inappropriately, is likely to cause warnings or errors, and many classification algorithms are very sensitive to missing values. For these reasons, handling missing values is an important phase in many classification and data mining tasks. This paper introduces the traditional EM algorithm and its disadvantage. We propose a new method for imputing missing values based on the EM algorithm, which uses Naive Bayesian to improve accuracy. We evaluate the method by classifying the seeds dataset and the vertebral column dataset and comparing the results with those obtained from two other missing-value handling methods: the traditional EM algorithm and the non-substitution method. The experimental results show that the algorithm is stable and improves classification accuracy on large datasets that contain many missing values.
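
As a rough illustration of the traditional EM baseline mentioned in the abstract, the sketch below imputes missing entries under a single multivariate Gaussian. It is not the authors' method: the abstract does not give the details of the GMM or Naive Bayesian steps, so both are omitted here. The function name `em_impute`, its parameters, and the small ridge term added to the covariance are assumptions made for this example only.

```python
import numpy as np

def em_impute(X, n_iter=100, tol=1e-6, ridge=1e-6):
    """Hypothetical helper: EM imputation under one multivariate Gaussian.

    Missing entries are marked with NaN. Assumes every column has at
    least one observed value.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    n, d = X.shape

    # Initialise with column means and the covariance of the mean-filled data.
    mu = np.nanmean(X, axis=0)
    filled = np.where(missing, mu, X)
    sigma = np.cov(filled, rowvar=False) + ridge * np.eye(d)

    for _ in range(n_iter):
        prev = filled.copy()
        cond_cov_sum = np.zeros((d, d))

        # E-step: replace each missing block by its conditional mean
        # given the observed entries of the same row.
        for i in range(n):
            m = missing[i]
            if not m.any():
                continue
            o = ~m
            if not o.any():
                # Nothing observed in this row: fall back to the mean.
                filled[i] = mu
                cond_cov_sum += sigma
                continue
            s_oo = sigma[np.ix_(o, o)]
            s_mo = sigma[np.ix_(m, o)]
            gain = np.linalg.solve(s_oo, s_mo.T).T  # s_mo @ inv(s_oo)
            filled[i, m] = mu[m] + gain @ (X[i, o] - mu[o])
            # Conditional covariance of the missing block, needed in the M-step.
            cond_cov_sum[np.ix_(m, m)] += sigma[np.ix_(m, m)] - gain @ s_mo.T

        # M-step: re-estimate the mean and covariance from the completed data.
        mu = filled.mean(axis=0)
        centered = filled - mu
        sigma = (centered.T @ centered + cond_cov_sum) / n + ridge * np.eye(d)

        if np.max(np.abs(filled - prev)) < tol:
            break

    return filled
```

Calling `em_impute(X)` on an array whose missing cells are `NaN` returns a completed copy with the observed cells unchanged. A mixture-model variant would additionally weight each row's conditional mean by its component responsibilities.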

Similar resources

Estimating Gaussian Mixture Models from Data with Missing Features

Maximum likelihood (ML) fitting of Gaussian mixture models (GMMs) to feature data is most efficiently handled by the EM algorithm [1, 2, 3, 4]. The EM algorithm is directly applicable to multivariate data in which all the features are always present, and there are no missing values. Unfortunately, missing values are common: caused either by random or systematic effects. This study presents a novel a...

Missing value imputation: with application to handwriting data

Missing values make pattern analysis difficult, particularly with limited available data. In longitudinal research, missing values accumulate, thereby aggravating the problem. Here we consider how to deal with temporal data with missing values in handwriting analysis. In the task of studying development of individuality of handwriting, we encountered the fact that feature values are missing for...

The Bayesian Structural EM Algorithm

In recent years there has been a flurry of works on learning Bayesian networks from data. One of the hard problems in this area is how to effectively learn the structure of a belief network from incomplete data—that is, in the presence of missing values or hidden variables. In a recent paper, I introduced an algorithm called Structural EM that combines the standard Expectation Maximization (EM)...

Bayesian Network Induction With Incomplete Private Data

A Bayesian network is a graphical model for representing probabilistic relationships among a set of variables. It is an important model for business analysis. Bayesian network learning methods have been applied to business analysis where data privacy is not considered. However, how to learn a Bayesian network over private data presents a much greater challenge. In this paper, we develop an appr...

Incremental Learning of Bayesian Networks with Hidden Variables

In this paper, an incremental method for learning Bayesian networks based on evolutionary computing, IEMA, is put forward. IEMA introduces the evolutionary algorithm and the EM algorithm into the process of incremental learning, and can not only avoid getting trapped in local maxima but also incrementally learn Bayesian networks with high accuracy in the presence of missing values and hidden variables. In addit...

Publication date: 2014